Search | BVS Regional Portal

1.

The chromosome-level genome assembly of the giant dobsonfly Acanthacorydalis orientalis (McLachlan, 1899).

Zou, Mingming; Lin, Aili; Wang, Yuyu; Yang, Ding; Liu, Xingyue.

Sci Data ; 11(1): 351, 2024 Apr 08.

Article in English | MEDLINE | ID: mdl-38589366

ABSTRACT

Acanthacorydalis orientalis (McLachlan, 1899) (Megaloptera: Corydalidae) is an important freshwater benthic invertebrate from East Asia that serves as an indicator species for water-quality biomonitoring and is of conservation value. Here, a high-quality reference genome for A. orientalis was constructed using Oxford Nanopore sequencing and high-throughput chromosome conformation capture (Hi-C) technology. The final genome size is 547.98 Mb, with contig and scaffold N50 values of 7.77 Mb and 50.53 Mb, respectively. The longest contig and scaffold are 20.57 Mb and 62.26 Mb in length, respectively. In total, 99.75% of contigs were anchored onto 13 pseudo-chromosomes. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis showed that the genome assembly is 99.01% complete. A total of 10,977 protein-coding genes were identified, of which 84.00% are functionally annotated. The genome contains 44.86% repeat sequences. This high-quality genome provides substantial data for future studies on population genetics, aquatic adaptation, and evolution of Megaloptera and other related insect groups.
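The contig and scaffold N50 values quoted above follow the standard definition: sort sequence lengths from longest to shortest and report the length at which the running total first reaches half of the total assembled length. A minimal Python sketch of that definition follows; the toy lengths are hypothetical and are not taken from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (contigs or scaffolds).

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid halving with floats.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (total = 35): 8 + 7 = 15 < 17.5, 8 + 7 + 6 = 21 >= 17.5.
toy_contigs = [2, 3, 4, 5, 6, 7, 8]
print(n50(toy_contigs))  # prints 6
```

Because N50 depends only on the length distribution, it measures contiguity rather than correctness; completeness is assessed separately, here with BUSCO.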

Subjects

Genome, Insect , Neoptera , Repetitive Sequences, Nucleic Acid , Chromosomes/genetics , Molecular Sequence Annotation , Phylogeny , Neoptera/genetics

2.

Enhanced bovine genome annotation through integration of transcriptomics and epi-transcriptomics datasets facilitates genomic biology.

Beiki, Hamid; Murdoch, Brenda M; Park, Carissa A; Kern, Chandlar; Kontechy, Denise; Becker, Gabrielle; Rincon, Gonzalo; Jiang, Honglin; Zhou, Huaijun; Thorne, Jacob; Koltes, James E; Michal, Jennifer J; Davenport, Kimberly; Rijnkels, Monique; Ross, Pablo J; Hu, Rui; Corum, Sarah; McKay, Stephanie; Smith, Timothy P L; Liu, Wansheng; Ma, Wenzhi; Zhang, Xiaohui; Xu, Xiaoqing; Han, Xuelei; Jiang, Zhihua; Hu, Zhi-Liang; Reecy, James M.

Gigascience ; 13, 2024 Jan 02.

Article in English | MEDLINE | ID: mdl-38626724

ABSTRACT

BACKGROUND: The accurate identification of the functional elements in the bovine genome is a fundamental requirement for high-quality analysis of data informing both genome biology and genomic selection. Functional annotation of the bovine genome was performed to identify a more complete catalog of transcript isoforms across bovine tissues. RESULTS: A total of 160,820 unique transcripts (50% protein coding) representing 34,882 unique genes (60% protein coding) were identified across tissues. Among them, 118,563 transcripts (73% of the total) were structurally validated by independent datasets (PacBio isoform sequencing data, Oxford Nanopore Technologies sequencing data, de novo assembled transcripts from RNA sequencing data) and by comparison with Ensembl and NCBI gene sets. In addition, all transcripts were supported by extensive data from different technologies such as whole transcriptome termini site sequencing, RNA Annotation and Mapping of Promoters for the Analysis of Gene Expression, chromatin immunoprecipitation sequencing, and assay for transposase-accessible chromatin using sequencing. A large proportion of identified transcripts (69%) were unannotated, of which 86% were produced by annotated genes and 14% by unannotated genes. A median of two 5' untranslated regions were expressed per gene. Around 50% of protein-coding genes in each tissue were bifunctional and transcribed both coding and noncoding isoforms. Furthermore, we identified 3,744 genes that functioned as noncoding genes in fetal tissues but as protein-coding genes in adult tissues. Our new bovine genome annotation extended more than 11,000 annotated gene borders compared with Ensembl or NCBI annotations. The resulting bovine transcriptome was integrated with publicly available quantitative trait loci data to study tissue-tissue interconnections involved in different traits and to construct the first bovine trait similarity network.
CONCLUSIONS: These validated results show significant improvement over current bovine genome annotations.

Subjects

Gene Expression Profiling , Genomics , Cattle/genetics , Animals , Sequence Analysis, RNA , Transcriptome , Quantitative Trait Loci , RNA , Protein Isoforms , Molecular Sequence Annotation

3.

Earl Grey: A Fully Automated User-Friendly Transposable Element Annotation and Analysis Pipeline.

Baril, Tobias; Galbraith, James; Hayward, Alex.

Mol Biol Evol ; 41(4), 2024 Apr 02.

Article in English | MEDLINE | ID: mdl-38577785

ABSTRACT

Transposable elements (TEs) are major components of eukaryotic genomes and are implicated in a range of evolutionary processes. Yet, TE annotation and characterization remain challenging, particularly for nonspecialists, since existing pipelines are typically complicated to install, run, and extract data from. Current methods of automated TE annotation are also subject to issues that reduce overall quality, particularly (i) fragmented and overlapping TE annotations, leading to erroneous estimates of TE count and coverage, and (ii) repeat models that represent only short sections of the total TE length, with poor capture of 5' and 3' ends. To address these issues, we present Earl Grey, a fully automated TE annotation pipeline designed for user-friendly curation and annotation of TEs in eukaryotic genome assemblies. Using nine simulated genomes and an annotation of Drosophila melanogaster, we show that Earl Grey outperforms current widely used TE annotation methodologies in ameliorating the issues mentioned above while scoring highly in benchmarking for TE annotation and classification and being robust across genomic contexts. Earl Grey is a comprehensive and fully automated TE annotation toolkit that gives researchers paper-ready summary figures and outputs in standard formats compatible with other bioinformatics tools. Earl Grey has a modular format, with great scope for the inclusion of additional modules focused on further quality control and tailored analyses in future releases.
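The abstract notes that fragmented and overlapping TE annotations inflate estimates of TE count and coverage. A generic interval-merging sketch in Python makes the effect concrete. This is not Earl Grey's own defragmentation procedure; it assumes, for illustration only, that all hits belong to one TE family on one sequence, uses half-open [start, end) coordinates, and the hit coordinates are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def naive_coverage(intervals):
    """Sum of hit lengths; bases in overlapping hits are counted twice."""
    return sum(end - start for start, end in intervals)


def merged_coverage(intervals):
    """Number of distinct bases covered by at least one hit."""
    return sum(end - start for start, end in merge_intervals(intervals))


# Hypothetical hits for a single TE family on one scaffold.
hits = [(100, 500), (450, 900), (2000, 2300)]
print(len(hits), naive_coverage(hits))                    # 3 1150
print(len(merge_intervals(hits)), merged_coverage(hits))  # 2 1100
```

Real TE defragmentation also has to weigh gaps between fragments, family identity, and position within the consensus model, so simple overlap merging only illustrates why unmerged output overstates counts and coverage.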

Subjects

DNA Transposable Elements , Drosophila melanogaster , Animals , DNA Transposable Elements/genetics , Molecular Sequence Annotation , Drosophila melanogaster/genetics , Genomics/methods , Computational Biology

4.

From tradition to innovation: conventional and deep learning frameworks in genome annotation.

Chen, Zhaojia; Ain, Noor Ul; Zhao, Qian; Zhang, Xingtan.

Brief Bioinform ; 25(3), 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38581418

ABSTRACT

Following the milestone success of the Human Genome Project, the 'Encyclopedia of DNA Elements (ENCODE)' initiative was launched in 2003 to uncover the numerous functional elements within the genome. This endeavor coincided with the emergence of many novel technologies, which provided vast amounts of whole-genome sequence and high-throughput data such as ChIP-Seq and RNA-Seq. Extracting biologically meaningful information from these massive datasets has become a critical aspect of many recent studies, particularly for annotating and predicting the functions of unknown genes. The core idea behind genome annotation is to identify genes and other functional elements within the genome sequence and infer their biological functions. Traditional wet-lab methods still require extensive effort for functional verification. Meanwhile, early bioinformatics algorithms and software relied mainly on shallow learning techniques, which limited their capacity for data representation and feature learning. With the widespread adoption of RNA-Seq technology, scientists in the biological community began to harness machine learning and deep learning approaches for gene structure prediction and functional annotation. In this context, we review both conventional methods and contemporary deep learning frameworks and highlight new perspectives on the challenges that arise during annotation, underscoring the dynamic nature of this evolving field.

Subjects

Deep Learning , Humans , Genome , Algorithms , Software , Computational Biology/methods , Molecular Sequence Annotation

5.

KEGG orthology prediction of bacterial proteins using natural language processing.

Chen, Jing; Wu, Haoyu; Wang, Ning.

BMC Bioinformatics ; 25(1): 146, 2024 Apr 11.

Article in English | MEDLINE | ID: mdl-38600441

ABSTRACT

BACKGROUND: The advent of high-throughput technologies has led to an exponential increase in uncharacterized bacterial protein sequences, surpassing the capacity of manual curation. A large number of bacterial protein sequences remain unannotated by Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology, making automatic annotation tools necessary. These tools are now indispensable in the biological research landscape, bridging the gap between the vastness of unannotated sequences and meaningful biological insights. RESULTS: In this work, we propose a novel pipeline for KEGG orthology annotation of bacterial protein sequences that uses natural language processing and deep learning. To assess the effectiveness of our pipeline, we conducted evaluations using the genomes of two randomly selected species from the KEGG database. In our evaluation, we obtained competitive results, with precision, recall, and F1 score values of 0.948, 0.947, and 0.947, respectively. CONCLUSIONS: Our experimental results suggest that our pipeline performs comparably to traditional methods and excels at identifying distant relatives with low sequence identity. This demonstrates the potential of our pipeline to significantly improve the accuracy and comprehensiveness of KEGG orthology annotation, thereby advancing our understanding of functional relationships within biological systems.
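The three reported metrics are linked: F1 is the harmonic mean of precision and recall. The Python sketch below uses the generic textbook definitions (not the authors' evaluation code) and checks that an F1 of 0.947 is consistent with the reported precision of 0.948 and recall of 0.947. The count-based helper takes generic true-positive, false-positive, and false-negative counts and is shown only for completeness.

```python
def f1_from(precision, recall):
    """F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# Consistency check against the values reported above.
print(round(f1_from(0.948, 0.947), 3))  # prints 0.947
```

Because the harmonic mean is pulled toward the smaller of its two inputs, an F1 that sits between nearly equal precision and recall, as here, indicates that neither metric is being traded off heavily against the other.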

Subjects

Bacterial Proteins , Natural Language Processing , Genome , Molecular Sequence Annotation , Amino Acid Sequence

6.

A chromosome-level genome assembly of the spider mite Tetranychus piercei McGregor.

Chen, Lei; Yu, Xin-Yue; Zhang, Feng; Zhang, Hua-Meng; Guo, Li-Xue; Ren, Lu; Hong, Xiao-Yue; Sun, Jing-Tao.

Sci Data ; 11(1): 340, 2024 Apr 05.

Article in English | MEDLINE | ID: mdl-38580722

ABSTRACT

Despite rapid advances in sequencing technology, limited genomic resources are currently available for phytophagous spider mites, which include many important agricultural pests. One of these pests is Tetranychus piercei (McGregor), a serious banana pest in East Asia that exhibits remarkable tolerance to high temperature. In this study, we assembled a high-quality genome of T. piercei by combining PacBio long-read and Illumina short-read sequencing. With the assistance of chromatin conformation capture technology, 99.9% of the contigs were anchored into three pseudochromosomes with a total size of 86.02 Mb. Repetitive elements account for 14.16% of this genome (12.20 Mb) and are predominantly composed of long terminal repeats (30.7%). By combining evidence from ab initio prediction, transcripts, and homologous proteins, we annotated 11,881 protein-coding genes. Both the genome and the proteins have high BUSCO completeness scores (>94%). This high-quality genome, along with its reliable annotation, provides a valuable resource for investigating the high-temperature tolerance of this species and for exploring the genomic basis of host range evolution in spider mites.

Subjects

Tetranychidae , Animals , Chromosomes , Genome , Genomics , Molecular Sequence Annotation , Phylogeny , Repetitive Sequences, Nucleic Acid , Tetranychidae/genetics

7.

Yak genome database: a multi-omics analysis platform.

Jiang, Hui; Chai, Zhi-Xin; Chen, Xiao-Ying; Zhang, Cheng-Fu; Zhu, Yong; Ji, Qiu-Mei; Xin, Jin-Wei.

BMC Genomics ; 25(1): 346, 2024 Apr 05.

Article in English | MEDLINE | ID: mdl-38580907

ABSTRACT

BACKGROUND: The yak (Bos grunniens) is a large ruminant species that lives in high-altitude regions and is exceptionally well adapted to plateau environments. To further understand the genetic characteristics and adaptive mechanisms of the yak, we have developed a multi-omics database of yak that includes genome, transcriptome, proteome, and DNA methylation data. DESCRIPTION: The Yak Genome Database (http://yakgenomics.com/) integrates research results from genome, transcriptome, proteome, and DNA methylation studies and provides an integrated platform for researchers to share and exchange omics data. The database contains 26,518 genes, 62 transcriptomes, 144,309 proteome spectra, and 22,478 methylation sites of yak. The genome module provides access to yak genome sequences, gene annotations, and variant information. The transcriptome module offers transcriptome data from various tissues of yak and cattle strains at different developmental stages. The proteome module presents protein profiles from diverse yak organs. Additionally, the DNA methylation module shows DNA methylation information at each base of the whole genome. The database supports data downloading and browsing, functional gene exploration, and experimental practice. CONCLUSION: This comprehensive database provides a valuable resource for further investigations on development, the molecular mechanisms underlying high-altitude adaptation, and molecular breeding of yak.

Subjects

Multiomics , Proteome , Animals , Cattle/genetics , Proteome/genetics , Genome , Transcriptome , Molecular Sequence Annotation

8.

Genome Assembly and Annotation of the Dark-Branded Bushbrown Butterfly Mycalesis mineus (Nymphalidae: Satyrinae).

Murugesan, Suriya Narayanan; Tian, Shen; Monteiro, Antónia.

Genome Biol Evol ; 16(3), 2024 Mar 02.

Article in English | MEDLINE | ID: mdl-38505885

ABSTRACT

We report a high-quality draft genome assembly of the dark-branded bushbrown, Mycalesis mineus, a member of the Satyrinae subfamily of nymphalid butterflies. This species is emerging as a promising model organism for investigating the evolution and development of phenotypic plasticity. Using 45.99 Gb of long-read data (N50 = 11.11 kb), we assembled a 497.4 Mb genome for M. mineus. The assembly is highly contiguous and nearly complete (96.8% of the lepidopteran Benchmarking Universal Single-Copy Orthologs genes were complete and single-copy). The genome comprises 38.71% repetitive elements and includes 20,967 predicted protein-coding genes. The assembled genome was super-scaffolded into 28 pseudo-chromosomes using the chromosome-level genome of a closely related species, Bicyclus anynana, as a template. This valuable genomic tool will advance both ongoing and future research focused on this model organism.

Subjects

Butterflies , Animals , Butterflies/genetics , Molecular Sequence Annotation , Genomics , Repetitive Sequences, Nucleic Acid , Genome Size , Chromosomes

9.

A high-quality chromosome-level genome assembly of the Chinese medaka Oryzias sinensis.

Dong, Zhongdian; Wang, Jiangman; Chen, Guozhu; Guo, Yusong; Zhao, Na; Wang, Zhongduo; Zhang, Bo.

Sci Data ; 11(1): 322, 2024 Mar 28.

Article in English | MEDLINE | ID: mdl-38548787

ABSTRACT

Oryzias sinensis, also known as the Chinese medaka or Chinese ricefish, is commonly used as an animal model for aquatic environmental assessment in the wild, as well as for gene function validation and toxicology research in the lab. Here, a high-quality chromosome-level genome assembly of O. sinensis was generated using single-tube long fragment read (stLFR) reads, Nanopore long reads, and Hi-C sequencing data. The genome is 796.58 Mb, and a total of 712.17 Mb of the assembled sequence was anchored to 23 pseudo-chromosomes. A final set of 22,461 genes was annotated, 98.67% of them functionally. The Benchmarking Universal Single-Copy Orthologs (BUSCO) completeness of the genome assembly and gene annotation reached 95.1% (93.3% single-copy) and 94.6% (91.7% single-copy), respectively. We also used ATAC-seq to profile transposase-accessible chromatin and the functional enrichment of the associated genomic regions in O. sinensis. This study offers an improved foundation for future genomics research in Chinese medaka.

Subjects

Oryzias , Animals , Chromosomes/genetics , Genome , Genomics , Molecular Sequence Annotation , Oryzias/genetics , Phylogeny

10.

PLMSearch: Protein language model powers accurate and fast sequence search for remote homology.

Liu, Wei; Wang, Ziye; You, Ronghui; Xie, Chenghan; Wei, Hong; Xiong, Yi; Yang, Jianyi; Zhu, Shanfeng.

Nat Commun ; 15(1): 2775, 2024 Mar 30.

Article in English | MEDLINE | ID: mdl-38555371

ABSTRACT

Homologous protein search is one of the most commonly used methods for protein annotation and analysis. Compared with structure search, detecting distant evolutionary relationships from sequences alone remains challenging. Here we propose PLMSearch (Protein Language Model), a homologous protein search method that takes only sequences as input. PLMSearch uses deep representations from a pre-trained protein language model and trains a similarity prediction model on a large set of real structural similarities. This enables PLMSearch to capture the remote homology information concealed behind the sequences. Extensive experimental results show that PLMSearch can search millions of query-target protein pairs in seconds, as MMseqs2 does, while increasing sensitivity more than threefold, and that it is comparable to state-of-the-art structure search methods. In particular, unlike traditional sequence search methods, PLMSearch can recall most remote homology pairs that have dissimilar sequences but similar structures. PLMSearch is freely available at https://dmiip.sjtu.edu.cn/PLMSearch.
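PLMSearch scores query-target pairs with a similarity model trained on structural similarity and applied to protein language model embeddings. The final ranking step shared by embedding-based search methods can be sketched in Python, with plain cosine similarity standing in for that trained model. This is an illustrative simplification, not PLMSearch's scoring function; the function names are hypothetical and the three-dimensional vectors are toy values rather than real language-model embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_targets(query, targets, top_k=3):
    """Rank target embeddings by cosine similarity to the query embedding.

    `targets` maps a target identifier to its embedding vector.
    Returns up to `top_k` (identifier, score) pairs, best first.
    """
    scored = [(cosine_similarity(query, emb), name) for name, emb in targets.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:top_k]]


# Hypothetical toy embeddings; real protein language model embeddings
# have hundreds to thousands of dimensions.
query = [1.0, 0.0, 0.0]
targets = {"t1": [1.0, 0.0, 0.0], "t2": [0.0, 1.0, 0.0], "t3": [1.0, 1.0, 0.0]}
for name, score in rank_targets(query, targets):
    print(name, round(score, 3))
```

Because each protein is embedded once and pairwise scoring reduces to vector arithmetic, this style of search scales to millions of pairs; the abstract's claim is that replacing raw vector similarity with a structure-trained predictor is what recovers remote homologs.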

Subjects

Biological Evolution , Proteins , Proteins/chemistry , Molecular Sequence Annotation , Algorithms , Sequence Analysis, Protein

11.

Chromosome-Level Assembly and Annotation of the Pearly Heath Coenonympha arcania Butterfly Genome.

Legeai, Fabrice; Romain, Sandra; Capblancq, Thibaut; Doniol-Valcroze, Paul; Joron, Mathieu; Lemaitre, Claire; Després, Laurence.

Genome Biol Evol ; 16(3), 2024 Mar 02.

Article in English | MEDLINE | ID: mdl-38491969

ABSTRACT

We present the first chromosome-level genome assembly and annotation of the pearly heath Coenonympha arcania, generated with a PacBio HiFi sequencing approach and complemented with Hi-C data. We additionally compare synteny, gene, and repeat content between C. arcania and other lepidopteran genomes. This reference genome will enable future population genomics studies with Coenonympha butterflies, a species-rich genus that encompasses some of the most highly endangered butterfly taxa in Europe.

Subjects

Butterflies , Animals , Butterflies/genetics , Genome , Chromosomes/genetics , Synteny , Europe , Molecular Sequence Annotation

12.

The genome of the rayed Mediterranean limpet Patella caerulea (Linnaeus, 1758).

Halstead-Nussloch, Gwyneth; Signorini, Silvia Giorgia; Giulio, Marco; Crocetta, Fabio; Munari, Marco; Della Torre, Camilla; Weber, Alexandra Anh-Thu.

Genome Biol Evol ; 16(4), 2024 Apr 02.

Article in English | MEDLINE | ID: mdl-38546725

ABSTRACT

Patella caerulea (Linnaeus, 1758) is a limpet species of the molluscan class Gastropoda. Endemic to the Mediterranean Sea, it is considered a keystone species because of its primary role in structuring and regulating the ecological balance of tidal and subtidal habitats. It is currently used as a bioindicator to assess the environmental quality of coastal marine waters and as a model species for understanding adaptation to ocean acidification. Here, we provide a high-quality reference genome assembly and annotation for P. caerulea. We generated ~30 Gb of Pacific Biosciences high-fidelity data from a single individual and provide a final 749.8 Mb assembly containing 62 contigs, including the mitochondrial genome (14,938 bp). With an N50 of 48.8 Mb and 98% of the assembly contained in the 18 largest contigs, this assembly is near chromosome-scale. Benchmarking Universal Single-Copy Orthologs scores were high (Mollusca, 87.8% complete; Metazoa, 97.2% complete) and similar to metrics observed for other chromosome-level Patella genomes, highlighting a possible bias in the Mollusca database for Patellids. We generated transcriptomic Illumina data from a second individual collected at the same locality and used it together with protein evidence to annotate the genome. A total of 23,938 protein-coding gene models were found. By comparing this annotation with other published Patella annotations, we found that the distributions and median values of exon and gene lengths were comparable with those of other Patella species despite different annotation approaches. The present high-quality P. caerulea reference genome, available on GenBank (BioProject: PRJNA1045377; assembly: GCA_036850965.1), is an important resource for future ecological and evolutionary studies.

Subjects

Gastropoda , Patella , Animals , Hydrogen-Ion Concentration , Molecular Sequence Annotation , Seawater , Mollusca/genetics , Chromosomes , Gastropoda/genetics

13.

Enriched atlas of lncRNA and protein-coding genes for the GRCg7b chicken assembly and its functional annotation across 47 tissues.

Degalez, Fabien; Charles, Mathieu; Foissac, Sylvain; Zhou, Haijuan; Guan, Dailu; Fang, Lingzhao; Klopp, Christophe; Allain, Coralie; Lagoutte, Laetitia; Lecerf, Frédéric; Acloque, Hervé; Giuffra, Elisabetta; Pitel, Frédérique; Lagarrigue, Sandrine.

Sci Rep ; 14(1): 6588, 2024 Mar 19.

Article in English | MEDLINE | ID: mdl-38504112

ABSTRACT

Gene atlases for livestock are steadily improving thanks to new genome assemblies and new expression data that refine gene annotation. However, gene content varies across databases because of differences in RNA sequencing data and bioinformatics pipelines, especially for long non-coding RNAs (lncRNAs), which have higher tissue and developmental specificity and are harder to identify consistently than protein-coding genes (PCGs). As was done in 2020 for the chicken assemblies galgal5 and GRCg6a, we provide a new lncRNA-enriched gene atlas for the latest GRCg7b chicken assembly, integrating the "NCBI RefSeq" and "EMBL-EBI Ensembl/GENCODE" reference annotations and other resources such as FAANG and NONCODE. As a result, the number of PCGs increases from 18,022 (RefSeq) and 17,007 (Ensembl) to 24,102, and that of lncRNAs from 5789 (RefSeq) and 11,944 (Ensembl) to 44,428. Using 1400 public RNA-seq transcriptomes representing 47 tissues, we provide expression evidence for 35,257 (79%) lncRNAs and 22,468 (93%) PCGs, supporting the relevance of this atlas. Further characterization, including tissue specificity, sex-differential expression, and gene configurations, is provided. We also identified conserved miRNA-hosting genes with human counterparts, suggesting a common function. The annotated atlas is available at gega.sigenae.org.

Subjects

RNA, Long Noncoding , Animals , Humans , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Chickens/genetics , Chickens/metabolism , Transcriptome , Molecular Sequence Annotation , Sequence Analysis, RNA

14.

Contigs directed gene annotation (ConDiGA) for accurate protein sequence database construction in metaproteomics.

Wu, Enhui; Mallawaarachchi, Vijini; Zhao, Jinzhi; Yang, Yi; Liu, Hebin; Wang, Xiaoqing; Shen, Chengpin; Lin, Yu; Qiao, Liang.

Microbiome ; 12(1): 58, 2024 Mar 19.

Article in English | MEDLINE | ID: mdl-38504332

ABSTRACT

BACKGROUND: Microbiota are closely associated with human health and disease. Metaproteomics can provide a direct means to identify microbial proteins in microbiota for compositional and functional characterization. However, in-depth and accurate metaproteomics is still limited by the extreme complexity and high diversity of microbiota samples. It is generally recommended to use metagenomic data from the same samples to construct the protein sequence database for metaproteomic data analysis. Although different metagenomics-based database construction strategies have been developed, no optimization of gene taxonomic annotation has been reported, even though such annotation is extremely important for accurate metaproteomic analysis. RESULTS: Herein, we propose an accurate taxonomic annotation pipeline for genes from metagenomic data, namely contigs directed gene annotation (ConDiGA), and use the method to build a protein sequence database for metaproteomic analysis. We compared our pipeline (ConDiGA, or MD3) with two other popular annotation pipelines (MD1 and MD2). In MD1, genes were directly annotated against the whole bacterial genome database; in MD2, contigs were annotated against the whole bacterial genome database and the taxonomic information of each contig was assigned to its genes; in MD3, the most confident species from the contig annotation results were taken as the reference for annotating genes. Annotation tools, including BLAST, Kaiju, and Kraken2, were compared. On a synthetic microbial community of 12 species, Kaiju with the MD3 pipeline outperformed the other combinations in constructing a protein sequence database from metagenomic data. Similar performance was also observed with a fecal sample, as well as with in silico mixed datasets of the simulated microbial community and the fecal sample. CONCLUSIONS: Overall, we developed an optimized pipeline for gene taxonomic annotation to construct protein sequence databases. Our study addresses the current problem of unreliable taxonomic annotation in metagenomics-derived protein sequence databases and can promote in-depth metaproteomic analysis of microbiomes. The unique metagenomic and metaproteomic datasets of the 12 bacterial species are publicly available as a standard benchmarking sample for evaluating various analysis pipelines. The ConDiGA code is openly available on GitHub for the analysis of microbiota samples.
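The difference between the MD1 and MD3 strategies can be illustrated with a toy Python sketch. It assumes that contig-level and gene-level searches (performed in the study with BLAST, Kaiju, or Kraken2) have already produced species labels and hit scores. The species names, scores, function names, and the rule that a species counts as "confident" when at least two contigs are assigned to it are all hypothetical; this is not the published ConDiGA code, and MD2 is not shown.

```python
from collections import Counter


def confident_species(contig_species, min_contigs=2):
    """Species assigned to at least `min_contigs` contigs (hypothetical confidence rule).

    `contig_species` maps contig ID -> species label, or None if unassigned.
    """
    counts = Counter(sp for sp in contig_species.values() if sp is not None)
    return {sp for sp, n in counts.items() if n >= min_contigs}


def annotate_genes(gene_hits, allowed_species=None):
    """Assign each gene the species of its best-scoring hit.

    `gene_hits` maps gene ID -> list of (species, score) hits.
    allowed_species=None: every hit is eligible (MD1-like).
    Otherwise only hits to allowed species count (MD3-like).
    """
    labels = {}
    for gene, hits in gene_hits.items():
        eligible = [(score, sp) for sp, score in hits
                    if allowed_species is None or sp in allowed_species]
        labels[gene] = max(eligible)[1] if eligible else None
    return labels


# Hypothetical contig labels and gene-level hits.
contigs = {"c1": "E. coli", "c2": "E. coli", "c3": "B. subtilis",
           "c4": "B. subtilis", "c5": "S. enterica"}
genes = {"g1": [("S. enterica", 98.0), ("E. coli", 97.5)],
         "g2": [("B. subtilis", 99.0)],
         "g3": [("S. enterica", 90.0)]}

print(annotate_genes(genes))                              # MD1-like
print(annotate_genes(genes, confident_species(contigs)))  # MD3-like
```

In this toy case the MD1-like pass labels g1 with a species that has little contig-level support, whereas the MD3-like pass restricts g1 to a well-supported species and leaves g3 unassigned, which is the kind of false taxonomic assignment the contig-directed strategy is meant to suppress.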

Subjects

Microbiota , Humans , Databases, Protein , Molecular Sequence Annotation , Reproducibility of Results , Microbiota/genetics , Metagenome/genetics , Bacteria/genetics , Metagenomics/methods

15.

An historical "wreck": A transcriptome assembly of the naval shipworm, Teredo navalis Linnaeus, 1758.

Gomes-Dos-Santos, André; Domingues, Marcos; Ruivo, Raquel; Fonseca, Elza; Froufe, Elsa; Deyanova, Diana; Franco, João N; C Castro, L Filipe.

Mar Genomics ; 74: 101097, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38485291

ABSTRACT

Historically famous for their damage to human-built marine wooden structures, shipworm molluscs play a central ecological role in marine ecosystems. Their association with bacterial symbionts, which provide cellulolytic and nitrogen-fixing activities, underpins their exceptional wood-eating and wood-boring behaviours, improving energy transfer and the recycling of essential nutrients locked in wood cellulose. Importantly, from a molecular standpoint, very few omic resources are available for this lineage of Bivalvia. Here, we produced and assembled a transcriptome of the globally distributed naval shipworm, Teredo navalis (family Teredinidae). The transcriptome was obtained by sequencing total RNA from five equidistant segments of the whole body of a T. navalis specimen. The quality of the assembly was assessed with several statistics, revealing a highly contiguous (N50 = 1,194 bp) and complete (over 90% BUSCO scores for the Eukaryota and Metazoa databases) transcriptome, with nearly 38,000 predicted ORFs, more than half of them functionally annotated. Our findings pave the way for investigating the unique evolutionary biology of these highly modified bivalves and lay the foundation for adequate gene annotation of a full genome sequence of the species.

Subjects

Bivalvia , Ecosystem , Humans , Animals , Transcriptome , Bivalvia/genetics , Biological Evolution , Wood , Molecular Sequence Annotation

16.

A chromosome-level genome assembly of the pig-nosed turtle (Carettochelys insculpta).

Li, Ye; Liu, Yuxuan; Zheng, Jiangmin; Wu, Baosheng; Cui, Xinxin; Xu, Wenjie; Zhu, Chenglong; Qiu, Qiang; Wang, Kun.

Sci Data ; 11(1): 311, 2024 Mar 23.

Article in English | MEDLINE | ID: mdl-38521795

ABSTRACT

The pig-nosed turtle (Carettochelys insculpta), the only extant species of the family Carettochelyidae, is a unique member of Trionychia that is fully adapted to aquatic life and is currently endangered. To enhance our understanding of this species and contribute to its conservation, we used high-fidelity (HiFi) and Hi-C sequencing to generate a chromosome-level genome assembly. The assembly spans 2.18 Gb, with a contig N50 of 126 Mb, and comprises 34 chromosomes that account for 99.6% of the genome. The assembly has BUSCO scores above 95% with different databases and shows strong collinearity with the Yangtze giant softshell turtle (Rafetus swinhoei), indicating its completeness and continuity. A total of 19,175 genes and 46.86% repetitive sequences were annotated. This chromosome-scale genome is a valuable resource for the pig-nosed turtle, providing insights into its aquatic adaptation and serving as a foundation for future turtle research.

Subjects

Genome , Turtles , Animals , Chromosomes/genetics , Molecular Sequence Annotation , Phylogeny , Repetitive Sequences, Nucleic Acid , Turtles/genetics

17.

Partial order relation-based gene ontology embedding improves protein function prediction.

Li, Wenjing; Wang, Bin; Dai, Jin; Kou, Yan; Chen, Xiaojun; Pan, Yi; Hu, Shuangwei; Xu, Zhenjiang Zech.

Brief Bioinform ; 25(2), 2024 Jan 22.

Article in English | MEDLINE | ID: mdl-38446740

ABSTRACT

Protein annotation has long been a challenging task in computational biology. Gene Ontology (GO) has become one of the most popular frameworks for describing protein functions and their relationships. Annotating a protein with the proper GO terms requires high-quality GO term representation learning, which aims to learn, for each functional label, a low-dimensional dense vector representation with accompanying semantic meaning, also known as an embedding. However, existing GO term embedding methods, which mainly take ancestral co-occurrence information into account, have yet to capture the full topological information in the GO directed acyclic graph (DAG). In this study, we propose a novel GO term representation learning method, PO2Vec, which uses partial order relationships to improve GO term representations. Extensive evaluations show that PO2Vec achieves better outcomes than existing embedding methods in a variety of downstream biological tasks. Based on PO2Vec, we further developed a new protein function prediction method, PO2GO, which demonstrates superior performance on multiple metrics, in annotation specificity, and in few-shot prediction on the benchmarks. These results suggest that a high-quality representation of GO structure is critical for diverse biological tasks, including computational protein annotation.
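The partial order that PO2Vec exploits comes from the GO directed acyclic graph: a more general term precedes every term beneath it, while terms on separate branches can be incomparable. The Python sketch below only computes that relation from a child-to-parents mapping; it does not learn embeddings and is not PO2Vec itself, and the term names are hypothetical placeholders rather than real GO identifiers.

```python
def ancestors(term, parents):
    """All strict ancestors of `term`, given a mapping child -> iterable of parents."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def precedes(general, specific, parents):
    """True if `general` is a strict ancestor of `specific` in the DAG."""
    return general in ancestors(specific, parents)


def comparable(a, b, parents):
    """Two terms are comparable if they are equal or one is an ancestor of the other."""
    return a == b or precedes(a, b, parents) or precedes(b, a, parents)


# Hypothetical DAG: C has two parents, A and B, which share a common root.
dag = {"A": ["root"], "B": ["root"], "C": ["A", "B"]}
print(sorted(ancestors("C", dag)))  # ['A', 'B', 'root']
print(precedes("root", "C", dag))   # True
print(comparable("A", "B", dag))    # False: A and B are incomparable
```

The same ancestor closure underlies the GO true-path rule, under which a protein annotated with a term is implicitly annotated with all of that term's ancestors; a ranking or embedding that respects this order therefore cannot place a specific term "above" its own parents.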

Subjects

Benchmarking , Computational Biology , Gene Ontology , Learning , Molecular Sequence Annotation

18.

Genome Sequence Analysis of Calcifying Bacteria Bacillus paranthracis CT5 and Its Biomineralization Efficacy to Improve the Strength and Durability Properties of Civil Structures.

Sharma, Bhavdeep; Sharma, Shruti; Medicherla, Krishna M; Reddy, Sudhakara M.

Curr Microbiol ; 81(5): 109, 2024 Mar 11.

Article in English | MEDLINE | ID: mdl-38466427

ABSTRACT

Bacteria producing urea amidohydrolases (UA) and carbonic anhydrases (CA) are of great importance in civil engineering because these enzymes are responsible for microbially induced calcium carbonate precipitation (MICCP). In this investigation, the genome of Bacillus paranthracis CT5 and the expression of genes underlying MICCP were studied. B. paranthracis produced maximum levels of UA (669.3 U/ml) and CA (125 U/ml) on the 5th day of incubation and precipitated 197 mg/100 ml CaCO3 after 7 days of incubation. After 28 days of curing, the compressive strength of bacteria-admixed and bacteria-cured (B-B) specimens was 13.7% higher than that of water-mixed and water-cured (W-W) specimens. A significant decrease in water absorption was observed in bacteria-cured specimens compared with water-cured specimens after 28 days of curing. For genome analysis, reads were assembled de novo into a 5,402,771 bp assembly with an N50 of 273,050 bp. RAST annotation detected six amidohydrolase and three carbonic anhydrase genes. Of the 5700 coding sequences found in the genome, COG annotation assigned 4360 genes to COG categories, with the highest numbers in transcription (435 genes) and amino acid transport and metabolism (362 genes), along with cell wall/membrane/envelope biogenesis and ion transport and metabolism. KEGG functional classification predicted 223 pathways comprising 1,960 genes; the largest were the two-component system (101 genes) and ABC transporter (98 genes) pathways, which enable the bacteria to sense and respond to environmental signals and to actively transport various minerals and organic molecules, including those required for MICCP.

Subjects

Bacillus , Biomineralization , Carbonic Anhydrases , Bacteria/metabolism , Calcium Carbonate/chemistry , Carbonic Anhydrases/genetics , Carbonic Anhydrases/metabolism , Molecular Sequence Annotation , Water/metabolism , Urease
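The N50 of 273,050 bp quoted in the B. paranthracis abstract is a contiguity statistic: the contig length L such that contigs of length L or longer together cover at least half of the total assembly. A minimal Python sketch of that calculation follows; the function name `n50` and the toy contig lengths are hypothetical illustrations, not code or data from the study.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("no contig lengths given")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating covered length.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig that brings coverage to >= 50% of total.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (not from the study): 8 contigs totalling 100 bp.
toy = [30, 20, 15, 10, 10, 8, 5, 2]
print(n50(toy))  # 30 + 20 = 50 bp reaches half of 100 bp, so prints 20
```

Sorting in descending order is what makes the statistic length-weighted: a few long contigs can dominate it even when many short fragments remain.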

19.

A revamped rat reference genome improves the discovery of genetic diversity in laboratory rats.

de Jong, Tristan V; Pan, Yanchao; Rastas, Pasi; Munro, Daniel; Tutaj, Monika; Akil, Huda; Benner, Chris; Chen, Denghui; Chitre, Apurva S; Chow, William; Colonna, Vincenza; Dalgard, Clifton L; Demos, Wendy M; Doris, Peter A; Garrison, Erik; Geurts, Aron M; Gunturkun, Hakan M; Guryev, Victor; Hourlier, Thibaut; Howe, Kerstin; Huang, Jun; Kalbfleisch, Ted; Kim, Panjun; Li, Ling; Mahaffey, Spencer; Martin, Fergal J; Mohammadi, Pejman; Ozel, Ayse Bilge; Polesskaya, Oksana; Pravenec, Michal; Prins, Pjotr; Sebat, Jonathan; Smith, Jennifer R; Solberg Woods, Leah C; Tabakoff, Boris; Tracey, Alan; Uliano-Silva, Marcela; Villani, Flavia; Wang, Hongyang; Sharp, Burt M; Telese, Francesca; Jiang, Zhihua; Saba, Laura; Wang, Xusheng; Murphy, Terence D; Palmer, Abraham A; Kwitek, Anne E; Dwinell, Melinda R; Williams, Robert W; Li, Jun Z.

Cell Genom ; 4(4): 100527, 2024 Apr 10.

Article in English | MEDLINE | ID: mdl-38537634

Abstract

The seventh iteration of the reference genome assembly for Rattus norvegicus, mRatBN7.2, corrects numerous misplaced segments, reduces base-level errors by approximately 9-fold, and increases contiguity by 290-fold compared with its predecessor. Gene annotations are now more complete, improving the mapping precision of genomic, transcriptomic, and proteomic datasets. We jointly analyzed 163 short-read whole-genome sequencing datasets representing 120 laboratory rat strains and substrains using mRatBN7.2. We defined ~20.0 million sequence variations, of which 18,700 are predicted to potentially impact the function of 6,677 genes. We also generated a new rat genetic map from 1,893 heterogeneous stock rats and annotated transcription start sites and alternative polyadenylation sites. The mRatBN7.2 assembly, along with the extensive analysis of genomic variation among rat strains, enhances our understanding of the rat genome and provides researchers with an expanded resource for studies involving rats.

Subjects

Genome , Genomics , Rats , Animals , Genome/genetics , Molecular Sequence Annotation , Whole Genome Sequencing , Genetic Variation/genetics

20.

A chromosome-level genome assembly of East Asia endemic minnow Zacco platypus.

Xu, Xiaojun; Chen, Jing; Guan, Wenzhi; Niu, Baolong; Yi, Shaokui; Lou, Bao.

Sci Data ; 11(1): 317, 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38538602

Abstract

Zacco platypus is an endemic, colorful freshwater minnow that is widely distributed in East Asia. In this study, two adult female individuals collected from the Haihe River basin were used for karyotypic analysis and genome sequencing, respectively. The karyotype formula of Z. platypus is 2N = 48 = 18M + 24SM/ST + 6T. We used PacBio long-read sequencing and Hi-C technology to assemble a chromosome-level genome of Z. platypus. An 814.87 Mb genome was assembled from the PacBio long reads, and 98.64% of the assembled sequences were subsequently anchored onto 24 chromosomes using the Hi-C data. The chromosome-level assembly contained 54 scaffolds with an N50 length of 32.32 Mb. Repeat elements accounted for 52.35% of the genome, and 24,779 protein-coding genes were predicted, of which 92.11% were functionally annotated using public databases. BUSCO analysis yielded a completeness score of 96.5%. This high-quality genome assembly provides valuable resources for future functional genomics, comparative genomics, and evolutionary studies of the genus Zacco.

Subjects

Cyprinidae , Animals , Female , East Asia , Chromosomes/genetics , Cyprinidae/genetics , Genomics , Molecular Sequence Annotation , Phylogeny
